Annotation Method of Arbitrary-Oriented Rectangular Bounding Box

ABSTRACT

Disclosed in the present invention is an annotation method for an arbitrary-oriented rectangular bounding box. The elements for annotation are: the coordinates of the center point C; a vector {right arrow over (CD)} formed by the center point C and a chosen vertex D; and the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}, where {right arrow over (CP)} is the projection of the vector {right arrow over (CE)} onto {right arrow over (CD)}, and {right arrow over (CE)} is the vector from the center of the bounding box to a vertex E adjacent to vertex D. It is also required that the vector {right arrow over (CP)} is in the same direction as the vector {right arrow over (CD)}, and that the vertex E lies in one fixed direction, either clockwise or counterclockwise, from the vertex D. The symbol notation of this method is (x_(c), y_(c), u, v, ρ), where x_(c) and y_(c) are the two coordinate values of the center point C, u and v are the two components of the vector {right arrow over (CD)}, and ρ is the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}. With this notation, one bounding box has only two representation vectors. In addition, a binary value s may be used to indicate whether the two components of the vector {right arrow over (CD)} have the same sign or not, so that {right arrow over (CD)} and −{right arrow over (CD)} are represented at once by (|u|, |v|, s) and one bounding box has a single representation vector. The symbol notation is then (x_(c), y_(c), |u|, |v|, s, ρ), wherein |u| and |v| are the magnitudes of the two components of the vector {right arrow over (CD)}. This method avoids loss inconsistency between representations of the same bounding box and is beneficial to model regression training.

CROSS-REFERENCE TO RELATED APPLICATION

The present application is a continuation of International Application No. PCT/CN2020/079379, filed on Mar. 14, 2020, the content of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present invention relates to object detection and tracking algorithms in computer vision, especially supervised-learning-based object detection and tracking algorithms. The method of this invention is a bounding box annotation method for such algorithms. This rectangular bounding box annotation method can be used for the bounding box output at prediction time, for defining anchor boxes, and for annotating sample images.

BACKGROUND ART

Object detection and tracking algorithms are of great value and have long been active research topics. At present, the most commonly used bounding box is an axis-aligned rectangle, annotated by its center point, width and height. There are several methods for annotating an arbitrary-oriented rectangular bounding box. The first, and the most commonly used, is an axis-aligned rectangle with an additional angle value relative to the x-axis or y-axis. The second is from the paper EAST: An Efficient and Accurate Scene Text Detector (DOI: 10.1109/CVPR.2017.283), which uses the distances from the center to the four edges of the rectangle and a rotation angle. The third lists the coordinates of the four vertexes, which is also commonly used. This method can represent an arbitrary quadrilateral, but has three redundant variables when representing a rectangle. The fourth takes the first two vertexes of the four clockwise-ordered vertexes of the rectangle and the distance from the second vertex to the third vertex; see R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection. The fifth uses the parameters of the axis-aligned Minimum Enclosing Rectangle of the bounding box and the gliding distances of the four vertexes between the axis-aligned Minimum Enclosing Rectangle and the bounding box; see Gliding vertex on the horizontal bounding box for multi-oriented object detection.

As to the axis-aligned rectangular bounding box, the defects are obvious. Objects in aerial images often have large aspect ratios, arbitrary orientations and dense arrangements. The intersection-over-union (IoU) between axis-aligned rectangular bounding boxes cannot truly represent the IoU between the objects themselves. This is particularly significant for large vehicles in parking lots and ships in harbors.

For an arbitrary-oriented bounding box annotated by an axis-aligned rectangular bounding box with an additional angle value relative to the x-axis or y-axis, exchanging the width and height and adding 2kπ+π/2 to the angle yields the same bounding box. Since one bounding box (b-box) has many numerical representations, highly similar bounding boxes can differ numerically in many ways, and the difference between these representations means an inconsistent loss in b-box regression, which makes training more difficult. For more about the shortcomings of this method, refer to SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. The second method is essentially the same as the first: replacing the width and height with the distances from the center to the four edges of the rectangle does not change anything, so it has the same shortcomings.
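
The ambiguity of the angle-based annotation can be checked numerically. The following is a minimal Python sketch for illustration only; the helper name `angle_box_corners` and the parametrization (center, width, height, angle in radians) are assumptions and are not taken from the cited works. It shows that (w, h, θ) and (h, w, θ+π/2) describe the same four corners.

```python
import math

def angle_box_corners(xc, yc, w, h, theta):
    """Corners of a rectangle given by center, width, height and angle.

    Hypothetical helper: the corners are returned sorted so that two
    descriptions of the same rectangle can be compared element-wise.
    """
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for sx, sy in ((1, 1), (-1, 1), (-1, -1), (1, -1)):
        dx, dy = sx * w / 2.0, sy * h / 2.0
        # Rotate the corner offset by theta, then translate by the center.
        pts.append((xc + dx * c - dy * s, yc + dx * s + dy * c))
    return sorted(pts)

# Same rectangle, two different numerical descriptions.
box_a = angle_box_corners(0.0, 0.0, 4.0, 2.0, 0.3)
box_b = angle_box_corners(0.0, 0.0, 2.0, 4.0, 0.3 + math.pi / 2)
same = all(math.isclose(pa, pb, abs_tol=1e-9)
           for a, b in zip(box_a, box_b) for pa, pb in zip(a, b))
```

Here `same` evaluates to true although the two parameter vectors differ in three of their five components, which is the source of the inconsistent regression loss discussed above.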

Listing the coordinates of the four vertexes also leads to one bounding box having many representation vectors. One way to avoid the problem is to sort the vertexes by their coordinates and calculate the loss between corresponding vertexes. For more information, refer to DOTA: A Large-scale Dataset for Object Detection in Aerial Images. However, this can result in vector-component misplacement: in one propagation the loss is calculated between the first component of the prediction vector and the second component of the target vector, but in another propagation it is calculated between the first component of the prediction vector and the third component of the target vector. This random correspondence is not conducive to training. The fourth method is the third method with the redundant variables removed; therefore it also leads to one bounding box having many representation vectors.

The fifth method aims to first predict the axis-aligned Minimum Enclosing Rectangle of the bounding box and then fine-tune it to the real rotated bounding box. When predicting the axis-aligned Minimum Enclosing Rectangle of the bounding box, it serves as the regression target of the anchor box. If the rotated bounding box needs to be predicted precisely, the axis-aligned Minimum Enclosing Rectangle also needs to be predicted precisely. This method increases the number of prediction targets and thereby the difficulty of prediction (regression). Thus it is not good for training either.

SUMMARY OF THE INVENTION

In order to solve the problems of inconsistent loss in b-box regression and difficult model regression encountered in the above-mentioned techniques, a new method for annotating an arbitrary-oriented rectangular bounding box is proposed, wherein

the elements for annotation are the coordinates of the center point C, a vector {right arrow over (CD)} formed by the center point C and a chosen vertex D, and the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}, where {right arrow over (CP)} is the projection of the vector {right arrow over (CE)} onto {right arrow over (CD)}, and {right arrow over (CE)} is the vector from the center of the bounding box to a vertex E adjacent to vertex D; the vector {right arrow over (CP)} is in the same direction as the vector {right arrow over (CD)}, and the vertex E lies in one fixed direction, either clockwise or counterclockwise, from the vertex D; the symbol notation of this method is (x_(c), y_(c), u, v, ρ), where x_(c) and y_(c) are the two coordinate values of the center point C, u and v are the two components of the vector {right arrow over (CD)}, and ρ is the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}.

With the method described above, one bounding box has only two representation vectors. In other words, taking the opposite vector of {right arrow over (CD)} and leaving the rest unchanged still represents the same bounding box. Because the two representations differ only in that their vectors {right arrow over (CD)} point in opposite directions, they can be represented at once. A binary value s is used to indicate whether the two components of the vector {right arrow over (CD)} are both positive (or both negative) or one positive and one negative (hereinafter referred to as same sign or different sign); then {right arrow over (CD)} and −{right arrow over (CD)} can be represented at once by (|u|, |v|, s), wherein |u| and |v| are the magnitudes of the two components of the vector {right arrow over (CD)}. If the two components are of the same sign, {right arrow over (CD)} and −{right arrow over (CD)} are (|u|, |v|) and (−|u|, −|v|). If the two components are of different sign, {right arrow over (CD)} and −{right arrow over (CD)} are (−|u|, |v|) and (|u|, −|v|). In this way, the number of representation vectors of one bounding box is reduced to one, with the symbol notation (x_(c), y_(c), |u|, |v|, s, ρ).

Thus a refined version of this invention uses a binary value s to indicate whether the two components of the vector {right arrow over (CD)} are both positive (or both negative) or one positive and one negative, so that {right arrow over (CD)} and −{right arrow over (CD)} are represented at once by (|u|, |v|, s).

Advantageous effects of the present invention are that it avoids loss inconsistency between representations of the same bounding box and is beneficial to model regression training. The present invention provides a method for annotating an arbitrary-oriented rectangular bounding box in which one bounding box has only two representation vectors, and the two representations differ only in that their (u, v) components are opposite numbers. Only one representation vector remains if a binary value s is used to indicate whether the two components of the vector {right arrow over (CD)} have the same sign or not. In addition, the correspondence between the components of the representation vectors does not need to be adjusted.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings and examples.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing an arbitrary-oriented bounding box annotation method;

FIG. 2 is a schematic diagram showing the loss between the predicted {right arrow over (CD*)} and the ground truth {right arrow over (CD)}.

DETAILED DESCRIPTION OF THE INVENTION

In FIG. 1, X represents the coordinate axis in the image row direction, Y represents the coordinate axis in the image column direction, C represents the center point of the bounding box, D and E are two adjacent vertexes of the bounding box, and P represents the projection point of E on the line through C and D, so that {right arrow over (CP)} is the projection of {right arrow over (CE)} onto {right arrow over (CD)}.

In FIG. 2, {right arrow over (CD)} represents the vector from the center point of the bounding box to the vertex D, {right arrow over (CD*)} is the prediction of {right arrow over (CD)}, {right arrow over (CP)} is the projection vector of {right arrow over (CD*)} on {right arrow over (CD)}, e_(p) is the length of the difference vector of {right arrow over (CD*)} and {right arrow over (CP)}, and e_(a) is the difference in length between {right arrow over (CD)} and {right arrow over (CP)}.

An annotation method for an arbitrary-oriented rectangular bounding box, used for defining anchor boxes, annotating sample images and outputting bounding boxes at prediction time in target detection and tracking algorithms, wherein

the elements for annotation are the coordinates of the center point C, a vector {right arrow over (CD)} formed by the center point C and a chosen vertex D, and the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}, where {right arrow over (CP)} is the projection of the vector {right arrow over (CE)} onto {right arrow over (CD)}, and {right arrow over (CE)} is the vector from the center of the bounding box to a vertex E adjacent to vertex D; the symbol notation of this method is (x_(c), y_(c), u, v, ρ), where x_(c) and y_(c) are the two coordinate values of the center point C, u and v are the two components of the vector {right arrow over (CD)}, and ρ is the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}.

To reduce the number of representation vectors, the value of ρ is required to be in [0,1), i.e. the vector {right arrow over (CP)} is in the same direction as the vector {right arrow over (CD)}, and the vertex E is required to lie in one fixed direction, either clockwise or counterclockwise, from the vertex D. With this constraint, there are only two representation vectors for one bounding box. In other words, taking the opposite vector of {right arrow over (CD)} and leaving the rest unchanged still represents the same bounding box.
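
As an illustration of the constraint above, the following minimal Python sketch derives (x_(c), y_(c), u, v, ρ) from four known vertexes. It assumes the vertexes are supplied in a consistent cyclic order, so that E is always the vertex following D in that order; the function name `encode_box` and the input format are hypothetical and not part of the method as described.

```python
def encode_box(vertices):
    """Encode a rectangle as (xc, yc, u, v, rho).

    `vertices` is a list of four (x, y) tuples in a consistent cyclic
    order (assumption). E is taken as the vertex following D, and D is
    chosen so that the projection ratio rho is non-negative.
    """
    xc = sum(p[0] for p in vertices) / 4.0
    yc = sum(p[1] for p in vertices) / 4.0
    for i in range(4):
        # Candidate D and its neighbor E in the fixed rotational direction.
        u, v = vertices[i][0] - xc, vertices[i][1] - yc
        ex, ey = vertices[(i + 1) % 4]
        eu, ev = ex - xc, ey - yc
        norm2 = u * u + v * v
        if norm2 == 0.0:
            raise ValueError("degenerate box: vertex coincides with center")
        # rho = (CE . CD) / |CD|^2, the signed ratio of CP to CD.
        rho = (eu * u + ev * v) / norm2
        if rho >= 0.0:
            return (xc, yc, u, v, rho)
    raise ValueError("vertices do not form a rectangle in cyclic order")

rect = [(2, 1), (-2, 1), (-2, -1), (2, -1)]
first = encode_box(rect)                  # D = (-2, 1), rho = 0.6
second = encode_box(rect[2:] + rect[:2])  # D = (2, -1), rho = 0.6
```

Starting the search from the opposite vertex returns the same ρ with (u, v) negated, which is the second of the two representation vectors mentioned above.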

Since one bounding box still has two representation vectors, a means of avoiding loss inconsistency is needed: a loss function that produces the same value for a prediction against either of the two representation vectors should be provided. Because the two representations differ only in that their vectors {right arrow over (CD)} point in opposite directions, letting the loss of the prediction {right arrow over (CD*)} be the same with respect to {right arrow over (CD)} and −{right arrow over (CD)} achieves the goal. Let {right arrow over (CP)} be the projection vector of {right arrow over (CD*)} on {right arrow over (CD)}; then one available loss function is:

|{right arrow over (CD*)}−{right arrow over (CP)}|+||{right arrow over (CD)}|−|{right arrow over (CP)}||

As shown in FIG. 2, |{right arrow over (CD*)}−{right arrow over (CP)}| is the length of the difference vector of {right arrow over (CD*)} and {right arrow over (CP)}, and ||{right arrow over (CD)}|−|{right arrow over (CP)}|| is the difference in length between {right arrow over (CD)} and {right arrow over (CP)}.
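
The loss function above can be sketched in plain Python as follows. This is an illustrative sketch rather than a training-ready implementation; the function name `projection_loss` and the tuple-based vector format are assumptions. It evaluates the loss of one prediction against {right arrow over (CD)} and against −{right arrow over (CD)} to show that the two values coincide.

```python
import math

def projection_loss(pred, target):
    """|CD* - CP| + ||CD| - |CP||, with CP the projection of CD* on CD.

    `pred` is the predicted vector CD*, `target` is the ground-truth
    vector CD, both given as (u, v) tuples (hypothetical format).
    """
    pu, pv = pred
    tu, tv = target
    # Scalar projection coefficient of CD* on CD.
    k = (pu * tu + pv * tv) / (tu * tu + tv * tv)
    cpu, cpv = k * tu, k * tv
    e_p = math.hypot(pu - cpu, pv - cpv)                   # |CD* - CP|
    e_a = abs(math.hypot(tu, tv) - math.hypot(cpu, cpv))   # ||CD| - |CP||
    return e_p + e_a

loss_pos = projection_loss((4.0, 3.0), (3.0, 4.0))
loss_neg = projection_loss((4.0, 3.0), (-3.0, -4.0))
```

Negating the target leaves the projection vector {right arrow over (CP)} unchanged, so `loss_pos` and `loss_neg` are equal, and a prediction equal to either {right arrow over (CD)} or −{right arrow over (CD)} gives a loss of zero.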

Because the two representations differ only in that their vectors {right arrow over (CD)} point in opposite directions, they can be represented at once. A binary value s is used to indicate whether the two components of the vector {right arrow over (CD)} are both positive (or both negative) or one positive and one negative (hereinafter referred to as same sign or different sign); then {right arrow over (CD)} and −{right arrow over (CD)} can be represented at once by (|u|, |v|, s), wherein |u| and |v| are the magnitudes of the two components of the vector {right arrow over (CD)}. If the two components are of the same sign, {right arrow over (CD)} and −{right arrow over (CD)} are (|u|, |v|) and (−|u|, −|v|). If the two components are of different sign, {right arrow over (CD)} and −{right arrow over (CD)} are (−|u|, |v|) and (|u|, −|v|). In this way, the number of representation vectors of one bounding box is reduced to one, with the symbol notation (x_(c), y_(c), |u|, |v|, s, ρ).
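
The conversion between (u, v) and (|u|, |v|, s) can be sketched as below. The encoding s = 1 for same sign and s = 0 for different sign, the treatment of a zero component as same sign, and the function names are all assumptions made for this illustration; the description only requires that s be binary.

```python
def to_sign_form(u, v):
    """Map (u, v) to (|u|, |v|, s); s = 1 means same sign (assumed encoding)."""
    # A zero component is treated as "same sign" here (assumption).
    s = 1 if u * v >= 0 else 0
    return abs(u), abs(v), s

def from_sign_form(abs_u, abs_v, s):
    """Return the pair (CD, -CD) described by (|u|, |v|, s)."""
    if s == 1:
        return (abs_u, abs_v), (-abs_u, -abs_v)
    return (-abs_u, abs_v), (abs_u, -abs_v)

# Both representation vectors of one box collapse to the same triple.
a = to_sign_form(-2, 1)
b = to_sign_form(2, -1)
pair = from_sign_form(*a)
```

Both `a` and `b` equal (2, 1, 0), and `pair` recovers the two opposite vectors (−2, 1) and (2, −1).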

Since the number of representation vectors has been reduced to one, the calculation of the loss is more convenient. When predicting a target box directly, the loss of x_(c), y_(c), |u|, |v| and ρ can be calculated in a regression manner, that is, directly from the difference between values, using for example SmoothL1 or L2. The loss of s can be calculated in a classification manner: the model outputs two values for s, indicating the likelihood of the same sign and of the different sign. If the value representing the same sign is larger, the two components are of the same sign; otherwise they are of different sign. The loss function can be CrossEntropy, L2, etc.
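
A minimal sketch of the classification-style handling of s is given below, using a two-class softmax cross-entropy written in plain Python. The names `sign_flag_loss` and `predict_sign_flag`, and the encoding s = 1 for same sign, are assumptions for illustration; in practice the corresponding loss of the chosen training framework would be used.

```python
import math

def sign_flag_loss(score_same, score_diff, s_true):
    """Two-class softmax cross-entropy for the sign flag.

    `score_same` and `score_diff` are the two raw model outputs for s;
    `s_true` is 1 for same sign and 0 for different sign (assumed encoding).
    """
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(score_same, score_diff)
    log_z = m + math.log(math.exp(score_same - m) + math.exp(score_diff - m))
    chosen = score_same if s_true == 1 else score_diff
    return log_z - chosen

def predict_sign_flag(score_same, score_diff):
    """Pick same sign (1) if its score is larger, otherwise different sign (0)."""
    return 1 if score_same > score_diff else 0

loss_right = sign_flag_loss(2.0, -1.0, 1)
loss_wrong = sign_flag_loss(2.0, -1.0, 0)
```

With these scores the model favors the same-sign class, so `loss_right` is smaller than `loss_wrong`.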

When using the feature vector to predict the regression parameters from the anchor box to the target box, it can be stipulated that an anchor box of the same sign regresses to a target box of the same sign, and an anchor box of the different sign regresses to a target box of the different sign. Then there is no need to calculate the loss of s.

When this method is used to annotate an axis-aligned rectangular b-box, the magnitudes of the two components of the vector {right arrow over (CD)} are half of the width and half of the height. So letting (u, v)=2{right arrow over (CD)} makes this method compatible with the axis-aligned rectangle annotated by the center point, width and height.

With this annotation method, the four vertexes of the rectangle can be calculated by solving the following equations. The coordinates of {right arrow over (CE)} are unknown; after {right arrow over (CE)} is solved, the coordinates of the vertexes can be calculated by vector addition and subtraction.

$\left\{ \begin{matrix} \left( \overrightarrow{CE} - \rho\,\overrightarrow{CD} \right) \cdot \overrightarrow{CD} = 0 \\ \left| \overrightarrow{CE} \right| = \left| \overrightarrow{CD} \right| \end{matrix} \right. \quad \text{s.t.}\ \overrightarrow{CE} \times \overrightarrow{CD} \geq 0 \ \text{or}\ \overrightarrow{CE} \times \overrightarrow{CD} \leq 0$

Here the first equation means that {right arrow over (EP)} is perpendicular to {right arrow over (CD)}, the second equation means that the lengths of {right arrow over (CE)} and {right arrow over (CD)} are identical, and the constraint means that the vertex E lies in one fixed direction, either clockwise or counterclockwise, from the vertex D. Only one of {right arrow over (CE)}×{right arrow over (CD)}≥0 and {right arrow over (CE)}×{right arrow over (CD)}≤0 is adopted.
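
The equations above admit a closed-form solution: {right arrow over (CE)} is ρ{right arrow over (CD)} plus a component perpendicular to {right arrow over (CD)} of length |{right arrow over (CD)}|·sqrt(1−ρ²), with the sign of that component fixed by the cross-product constraint. The following Python sketch illustrates this; the function name `decode_vertices` and the flag `cross_nonneg`, which selects between the two conventions, are hypothetical.

```python
import math

def decode_vertices(xc, yc, u, v, rho, cross_nonneg=True):
    """Recover the four vertexes D, E, -D, -E (relative to C) in cyclic order.

    `cross_nonneg=True` selects the convention CE x CD >= 0, otherwise
    CE x CD <= 0 (the flag name is an assumption of this sketch).
    """
    # CE = rho * CD + t * (-v, u), where (-v, u) is CD rotated by 90 degrees
    # and |t| = sqrt(1 - rho^2) so that |CE| = |CD|.
    t = math.sqrt(max(0.0, 1.0 - rho * rho))
    # With this decomposition, CE x CD = -t * |CD|^2, so t <= 0 gives >= 0.
    if cross_nonneg:
        t = -t
    eu = rho * u - t * v
    ev = rho * v + t * u
    return [(xc + u, yc + v), (xc + eu, yc + ev),
            (xc - u, yc - v), (xc - eu, yc - ev)]

# Both representation vectors of one box decode to the same rectangle.
box1 = decode_vertices(0.0, 0.0, -2.0, 1.0, 0.6, cross_nonneg=False)
box2 = decode_vertices(0.0, 0.0, 2.0, -1.0, 0.6, cross_nonneg=False)
```

In this example `box1` is approximately (−2, 1), (−2, −1), (2, −1), (2, 1), and `box2` lists the same four points starting from the opposite vertex.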

One embodiment thereof is: when annotating the sample image, the values of x_(c), y_(c), |u| and |v| are normalized according to the image width (w_(i)) and height (h_(i)). For compatibility with the axis-aligned rectangle annotated by the center point, width and height, |u| and |v| are scaled by a factor of 2. The corresponding values of the target bounding box in the annotation file are then x_(c)/w_(i), y_(c)/h_(i), 2|u|/w_(i), 2|v|/h_(i), s, ρ.
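
This embodiment can be sketched as a small helper that produces one annotation record. The function name `annotation_row`, the tuple layout, and the encoding of the sign flag as 1 for same sign and 0 for different sign are assumptions for illustration only.

```python
def annotation_row(xc, yc, u, v, rho, img_w, img_h):
    """Build the normalized record (xc/w, yc/h, 2|u|/w, 2|v|/h, s, rho).

    The sign flag is 1 when u and v have the same sign and 0 otherwise
    (assumed encoding; a zero component is treated as same sign).
    """
    s = 1 if u * v >= 0 else 0
    return (xc / img_w, yc / img_h,
            2 * abs(u) / img_w, 2 * abs(v) / img_h,
            s, rho)

# A box centered in a 100 x 50 image with CD = (-20, 10) and rho = 0.6.
row = annotation_row(50, 25, -20, 10, 0.6, 100, 50)
```

For an axis-aligned box the third and fourth entries reduce to the normalized width and height, which is the compatibility the factor of 2 is intended to provide.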

Another embodiment thereof is: when it is stipulated that an anchor box of the same sign regresses to a target box of the same sign, and an anchor box of the different sign regresses to a target box of the different sign, the regression parameters from the anchor box to the target box can be defined using the following formulas:

t_(x)=(x*_(c)−x_(c)^(a))/w_(a), t_(y)=(y*_(c)−y_(c)^(a))/h_(a)

t_(u)=ln(|u|*/|u|^(a)), t_(v)=ln(|v|*/|v|^(a)), t_(ρ)=ln(ρ*/ρ^(a))

Wherein x*_(c), y*_(c), |u|*, |v|* and ρ* are the parameters of the target box, x_(c)^(a), y_(c)^(a), |u|^(a), |v|^(a) and ρ^(a) are the parameters of the preset anchor box, and t_(x), t_(y), t_(u), t_(v) and t_(ρ) are the regression parameters that transform the anchor box into the target box, and are also the values that the model needs to output directly.
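
The regression parameters and their inverse can be sketched as follows. The function names are hypothetical, and w_(a) and h_(a) are passed in explicitly and taken here to be the anchor width and height, which is an assumption since the text does not define them. The logarithms require |u|, |v| and ρ to be strictly positive for both boxes; how a zero value is handled is not specified, and this sketch simply lets `math.log` raise an error in that case.

```python
import math

def encode_deltas(target, anchor, w_a, h_a):
    """Compute (t_x, t_y, t_u, t_v, t_rho) from target and anchor boxes.

    Each box is (xc, yc, |u|, |v|, rho); w_a and h_a are assumed to be
    the anchor width and height.
    """
    xt, yt, ut, vt, rt = target
    xa, ya, ua, va, ra = anchor
    return ((xt - xa) / w_a, (yt - ya) / h_a,
            math.log(ut / ua), math.log(vt / va), math.log(rt / ra))

def decode_deltas(deltas, anchor, w_a, h_a):
    """Invert encode_deltas: apply model outputs to the anchor box."""
    tx, ty, tu, tv, tr = deltas
    xa, ya, ua, va, ra = anchor
    return (xa + tx * w_a, ya + ty * h_a,
            ua * math.exp(tu), va * math.exp(tv), ra * math.exp(tr))

anchor = (10.0, 10.0, 2.0, 2.0, 0.25)
target = (12.0, 9.0, 4.0, 2.0, 0.5)
deltas = encode_deltas(target, anchor, 4.0, 4.0)
restored = decode_deltas(deltas, anchor, 4.0, 4.0)
```

Decoding the encoded parameters returns the target box up to floating-point error, which is the property needed when the model outputs the regression parameters directly.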

1. An annotation method of arbitrary-oriented rectangular bounding box, characterized in that the elements for annotation are: the coordinates of the center point C, a vector {right arrow over (CD)} formed by the center point C and a chosen vertex D, and the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}, where {right arrow over (CP)} is the projection of the vector {right arrow over (CE)} onto {right arrow over (CD)}, and {right arrow over (CE)} is the vector from the center of the bounding box to a vertex E adjacent to vertex D; the vector {right arrow over (CP)} is in the same direction as the vector {right arrow over (CD)}, and the vertex E lies in one fixed direction, either clockwise or counterclockwise, from the vertex D; the symbol notation of this method is (x_(c), y_(c), u, v, ρ), where x_(c) and y_(c) are the two coordinate values of the center point C, u and v are the two components of the vector {right arrow over (CD)}, and ρ is the ratio of the vector {right arrow over (CP)} to the vector {right arrow over (CD)}.
2. The annotation method of arbitrary-oriented rectangular bounding box according to claim 1, characterized in that: a binary value s is used to indicate whether the two components of the vector {right arrow over (CD)} are both positive (or both negative) or one positive and one negative, so that {right arrow over (CD)} and −{right arrow over (CD)} are represented at once by (|u|, |v|, s), whereby one bounding box has only one representation vector; the symbol notation is (x_(c), y_(c), |u|, |v|, s, ρ), wherein |u| and |v| are the magnitudes of the two components of the vector {right arrow over (CD)}.
3. The annotation method of arbitrary-oriented rectangular bounding box according to claim 1, characterized in that: letting (u, v)=2{right arrow over (CD)} makes this method compatible with the axis-aligned rectangle annotated by the center point, width and height; its symbol notation is (x_(c), y_(c), 2|u|, 2|v|, s, ρ).
4. The annotation method of arbitrary-oriented rectangular bounding box according to claim 2, characterized in that: letting (u, v)=2{right arrow over (CD)} makes this method compatible with the axis-aligned rectangle annotated by the center point, width and height; its symbol notation is (x_(c), y_(c), 2|u|, 2|v|, s, ρ).